Opening a business can be exciting, but the decision has to be made carefully if the business is to flourish. Several factors need to be taken into account before committing to one.
The main purpose of this project is to propose which type of business would be beneficial for a client to open in a specific area, Toronto. We will also locate similar businesses in the corresponding neighborhoods to determine whether people are likely to stop by near a given location.
[The entire code can be found at this link: https://github.com/jihea-katie-lee/Coursera_Capstone/blob/master/Code_The_Battle_of_Neighborhoods.ipynb]
To find the best type of business and its location, we will use the following sources of information:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
From this page, we obtained the postal code, borough, and neighborhood. Using the 'geocoder' package, we obtained the latitude and longitude of each neighborhood. Since we only consider the 'Toronto' region, the other boroughs have been ignored.
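The borough filter can be sketched in a few lines of plain Python. This is only an illustration: the rows below are hypothetical stand-ins for the scraped table, the field names are invented for the example, and the rule "the borough name contains 'Toronto'" is an assumption about how the 'Toronto' region was selected.

```python
# Hypothetical rows in the shape (postal code, borough, neighborhood).
# They stand in for the scraped table and are not the project's data.
rows = [
    {"PostalCode": "M5A", "Borough": "Downtown Toronto", "Neighborhood": "Regent Park"},
    {"PostalCode": "M1B", "Borough": "Scarborough", "Neighborhood": "Malvern"},
    {"PostalCode": "M4E", "Borough": "East Toronto", "Neighborhood": "The Beaches"},
    {"PostalCode": "M3A", "Borough": "North York", "Neighborhood": "Parkwoods"},
]


def keep_toronto(rows):
    """Return only the rows whose borough name contains 'Toronto' (assumed rule)."""
    return [row for row in rows if "Toronto" in row["Borough"]]


toronto_rows = keep_toronto(rows)
print([row["PostalCode"] for row in toronto_rows])
```

Only the rows that pass this filter would then be sent to the geocoding step.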
In this project, we would like to find out which types of restaurants are most common in Toronto. Therefore, we focus only on venues whose type is restaurant.
To do so, we will apply 'K-means clustering', an unsupervised algorithm that groups data points by similarity. It allows us to group neighborhoods with similar venue types into one segment.
In order to segment the neighborhoods and explore them, we need data on the neighborhoods in and near Toronto as well as their latitude and longitude coordinates. We will use the Foursquare API, which provides a database of millions of places, to explore the neighborhoods and search each one by venue category. From the database, we can obtain the following information:
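To make the idea concrete, here is a minimal plain-Python sketch of k-means (Lloyd's algorithm). It is not the code used in the project, which may well rely on a library implementation. The neighborhood names and feature vectors are hypothetical: each vector is assumed to hold the share of three restaurant types in a neighborhood.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels) for equal-length vectors."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical features: share of (Japanese, Italian, Thai) restaurants.
features = {
    "Neighborhood A": [0.8, 0.1, 0.1],
    "Neighborhood B": [0.7, 0.2, 0.1],
    "Neighborhood C": [0.1, 0.8, 0.1],
    "Neighborhood D": [0.2, 0.7, 0.1],
}
names = list(features)
_, labels = kmeans([features[n] for n in names], k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

Neighborhoods with a similar mix of restaurant types end up with the same cluster label. The label numbers themselves are arbitrary and depend on the random starting points.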
1. Neighborhood
2. Neighborhood's Latitude & Longitude
3. Venue
4. Venue's Latitude & Longitude
5. Venue Category
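The sketch below shows one way these five fields could be collected. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual query parameters. The credentials, version date, radius, and limit are placeholders, and the sample payload is a hand-written, hypothetical response in the shape that endpoint is assumed to return. No network request is made.

```python
from urllib.parse import urlencode


def explore_url(lat, lng, client_id, client_secret,
                version="20180605", radius=500, limit=100):
    """Build a request URL for the (assumed) legacy v2 venues/explore endpoint."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (assumed value)
        "ll": f"{lat},{lng}",
        "radius": radius,                # search radius in meters (assumed value)
        "limit": limit,                  # maximum venues returned (assumed value)
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def flatten(neighborhood, lat, lng, payload):
    """Turn one response into rows holding the five fields listed above."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"], venue["location"]["lng"],
            venue["categories"][0]["name"],
        ))
    return rows


# Hypothetical response fragment, hand-written for illustration.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Sushi",
               "location": {"lat": 43.651, "lng": -79.381},
               "categories": [{"name": "Japanese Restaurant"}]}},
]}]}}

url = explore_url(43.65, -79.38, "YOUR_ID", "YOUR_SECRET")
rows = flatten("Neighborhood A", 43.65, -79.38, sample)
print(rows[0])
```

In practice the URL would be fetched once per neighborhood and the resulting rows concatenated into a single table.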
(Clusters 0, 1, 2, and 3 are shown in purple, blue, yellow, and red, respectively.)
After analyzing the data, we could see that some venues are labeled simply 'Restaurant', without specifying the type of food. We will exclude those records so that the categories are easier to evaluate. Also, some of the clusters contain only a single borough and therefore show no distinctive characteristic of their own. We will reduce the number of clusters, k, and evaluate the data again.
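The clean-up step might look like the following sketch. The venue records are hypothetical, and matching on the word "Restaurant" in the category name is an assumption about how restaurant venues are identified.

```python
# Hypothetical (neighborhood, venue category) records for illustration.
venues = [
    ("Neighborhood A", "Japanese Restaurant"),
    ("Neighborhood A", "Restaurant"),   # generic label, no cuisine given
    ("Neighborhood A", "Coffee Shop"),  # not a restaurant
    ("Neighborhood B", "Italian Restaurant"),
]


def specific_restaurants(venues):
    """Keep restaurant venues, but drop the generic 'Restaurant' category."""
    return [
        (neighborhood, category)
        for neighborhood, category in venues
        if "Restaurant" in category and category != "Restaurant"
    ]


cleaned = specific_restaurants(venues)
print(cleaned)
```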
(Clusters 0, 1, and 2 are shown in purple, green, and red, respectively.)
The plots above show that most of the boroughs have Japanese and Italian restaurants. Among all of the '1st Most Common Venue' entries, Japanese restaurants appear most often, followed by Italian restaurants. The main purpose of this project was to suggest a type of restaurant and its location in Toronto. Using this information, the client can decide which kind of restaurant to open.
In this project, 103 different Canadian postal codes were used to find the corresponding latitude and longitude for each entry in the dataset. Among them, the boroughs in the 'Toronto' region were evaluated to find a type of restaurant that could be profitable, as requested by the client. Using the Foursquare API, the top 5 most common restaurant venue types were evaluated with the k-means clustering algorithm. The client can now decide which type of restaurant to open in Toronto based on the information obtained from the data analysis in this project.
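A ranking such as the '1st Most Common Venue' can be computed by counting categories per neighborhood. The sketch below assumes that approach. The records are hypothetical and are not the project's results.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) records for illustration.
venues = [
    ("Neighborhood A", "Japanese Restaurant"),
    ("Neighborhood A", "Japanese Restaurant"),
    ("Neighborhood A", "Italian Restaurant"),
    ("Neighborhood B", "Italian Restaurant"),
    ("Neighborhood B", "Italian Restaurant"),
    ("Neighborhood B", "Thai Restaurant"),
    ("Neighborhood C", "Japanese Restaurant"),
    ("Neighborhood C", "Japanese Restaurant"),
    ("Neighborhood C", "Thai Restaurant"),
]


def most_common_by_neighborhood(venues, top_n=5):
    """Rank venue categories by frequency within each neighborhood."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        neighborhood: [category for category, _ in counter.most_common(top_n)]
        for neighborhood, counter in counts.items()
    }


ranking = most_common_by_neighborhood(venues)
# Tally which category ranks first across all neighborhoods.
first_place = Counter(ranked[0] for ranked in ranking.values())
print(first_place.most_common())
```

The `first_place` tally is the kind of summary that supports a statement like "category X is the most frequent first choice".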